% Appendix D - Duplicate Annotation in CiteULike



\chapter{Duplicate Annotation in CiteULike}
\label{app:duplicate-detection}
\lhead{Appendix D. \emph{Duplicate Annotation in CiteULike}}

  \begin{figure}[h]
    \centering
    \includegraphics[scale=0.38,viewport=520 30 550 820]{./figures/screenshot-deduplication.pdf}
    \caption[Screenshot of the interface for judging duplicate CiteULike pairs]{Screenshot of the interface for judging duplicate CiteULike pairs. The pair shown is a typical duplicate pair. For each article in the pair, we display the article ID, hyperlinked to the CiteULike article page, followed by the article title, the publication year, and the author(s).}
    \label{fig:screenshot-deduplication}
  \end{figure}
  
After obtaining a training set of 2,777 pairs as described in Subsection \ref{9:subsec:creating-training-set}, we needed to determine which of these pairs were duplicates and which were different items. Figure \ref{fig:screenshot-deduplication} shows the simple interface we created for duplicate annotation. The seed item is shown at the top of the screen with its item ID, title, year, and author metadata, and the same metadata fields are shown at the bottom of the screen for the matched item. The annotator then judges the two articles to be `the same' or `different'. The article IDs are linked to their \cul\ article pages for easy reference.




